AAAI.2024 - AI for Social Impact Track

Total: 77

#1 BirdCollect: A Comprehensive Benchmark for Analyzing Dense Bird Flock Attributes

Authors: Kshitiz ; Sonu Shreshtha ; Bikash Dutta ; Muskan Dosi ; Mayank Vatsa ; Richa Singh ; Saket Anand ; Sudeep Sarkar ; Sevaram Mali Parihar

Automatic recognition of bird behavior from long-term, uncontrolled outdoor imagery can contribute to conservation efforts by enabling large-scale monitoring of bird populations. Current techniques in AI-based wildlife monitoring have focused on short-term tracking and on monitoring birds individually rather than in species-rich flocks. We present BirdCollect, a comprehensive benchmark dataset for monitoring dense bird flock attributes. It includes a unique collection of more than 6,000 high-resolution images of Demoiselle Cranes (Anthropoides virgo) feeding and nesting in the vicinity of the Khichan region of Rajasthan. Notably, each image contains an average of 190 individual birds, illustrating the complex dynamics of densely populated bird flocks at a scale that has not previously been studied. In addition, a total of 433 distinct pictures captured at Keoladeo National Park, Bharatpur provide a comprehensive representation of 34 distinct bird species belonging to various taxonomic groups. These images offer insights into the diversity and behaviour of birds in a vital natural ecosystem along the migratory flyways. Additionally, we provide a set of 2,500 point-annotated samples which serve as ground truth for benchmarking various computer vision tasks like crowd counting, density estimation, segmentation, and species classification. The benchmark performance on these tasks highlights the need for approaches tailored to specific wildlife applications, which involve varied conditions of viewpoint, illumination, and resolution. At around 46.2 GB in size, encompassing data collected from two distinct nesting grounds, it is the largest bird dataset with detailed annotations, representing a substantial leap in bird research possibilities. We intend to publicly release the dataset to the research community. The database is available at: https://iab-rubric.org/resources/wildlife-dataset/birdcollect
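
Point annotations of this kind are typically converted into density maps for crowd counting, where each annotated bird contributes a unit-mass Gaussian so that the map integrates to the flock count. A minimal pure-Python sketch of that standard preprocessing step (illustrative only; the benchmark's actual pipeline is not specified here, and the point coordinates below are made up):

```python
import math

def density_map(points, h, w, sigma=2.0):
    """Render point annotations (one per bird) as a Gaussian density map.
    The map sums to the number of annotated birds."""
    dmap = [[0.0] * w for _ in range(h)]
    for (py, px) in points:
        # Accumulate a normalized 2-D Gaussian centred on each point.
        bump, norm = {}, 0.0
        r = int(3 * sigma)
        for y in range(max(0, py - r), min(h, py + r + 1)):
            for x in range(max(0, px - r), min(w, px + r + 1)):
                g = math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                bump[(y, x)] = g
                norm += g
        for (y, x), g in bump.items():
            dmap[y][x] += g / norm  # each bird contributes exactly 1.0 total
    return dmap

points = [(10, 10), (12, 30), (25, 25)]  # hypothetical bird head locations
dmap = density_map(points, 40, 40)
count = sum(sum(row) for row in dmap)
print(round(count))  # estimated count equals the number of annotations: 3
```

A counting model is then trained to regress such maps, and the integral of its output gives the predicted flock size.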

#2 A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing

Authors: Gabriel Agostini ; Emma Pierson ; Nikhil Garg

Decision-makers often observe the occurrence of events through a reporting process. City governments, for example, rely on resident reports to find and then resolve urban infrastructural problems such as fallen street trees, flooded basements, or rat infestations. Without additional assumptions, there is no way to distinguish events that occur but are not reported from events that truly did not occur--a fundamental problem in settings with positive-unlabeled data. Because disparities in reporting rates correlate with resident demographics, addressing incidents only on the basis of reports leads to systematic neglect in neighborhoods that are less likely to report events. We show how to overcome this challenge by leveraging the fact that events are spatially correlated. Our framework uses a Bayesian spatial latent variable model to infer event occurrence probabilities and applies it to storm-induced flooding reports in New York City, further pooling results across multiple storms. We show that a model accounting for under-reporting and spatial correlation predicts future reports more accurately than other models, and further induces a more equitable set of inspections: its allocations better reflect the population and provide equitable service to non-white, less traditionally educated, and lower-income residents. This finding reflects heterogeneous reporting behavior learned by the model: reporting rates are higher in Census tracts with higher populations, proportions of white residents, and proportions of owner-occupied households. Our work lays the groundwork for more equitable proactive government services, even with disparate reporting behavior.
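
The identifiability problem can be made concrete with a toy generative simulation (a hedged sketch, not the authors' Bayesian spatial model; the tract names, rates, and occurrence probability are hypothetical): incidents occur everywhere at the same rate, but only a tract-dependent fraction is reported, so naive report counts systematically understate occurrence in low-reporting tracts.

```python
import random

random.seed(0)

# Incidents occur with probability p_occur, but are *reported* only with a
# tract-specific rate. Observed data are positive-unlabeled: a missing
# report does not imply that no incident occurred.
tracts = {"A": 0.9, "B": 0.3}   # hypothetical reporting rates
p_occur = 0.5
n = 10_000                      # trials per tract

reported = {t: 0 for t in tracts}
for t, rate in tracts.items():
    for _ in range(n):
        if random.random() < p_occur and random.random() < rate:
            reported[t] += 1

# Naive report-based estimate vs. an estimate corrected by the known rate.
naive = {t: reported[t] / n for t in tracts}
corrected = {t: reported[t] / (n * rate) for t, rate in tracts.items()}
print(naive, corrected)  # naive["B"] badly underestimates p_occur
```

In practice the reporting rates are unknown, which is exactly why the paper leans on spatial correlation of the latent events to separate occurrence from reporting.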

#3 Automatic Interpretation of Line Probe Assay Test for Tuberculosis

Authors: Jatin Agrawal ; Mukul Kumar ; Avtansh Tiwari ; Sachin Danisetty ; Soma Dhavala ; Nakul Jain ; Prasaanth Balraj ; Niket Singh ; Siddhant Shingi ; Jayakrishna Kurada ; Raghuram Rao ; S Anand ; Nishant Kumar

Line Probe Assay (LPA) is a widely used method for diagnosing drug-resistant tuberculosis (DRTB), but it is a time-consuming and labor-intensive process that requires expert interpretation. DRTB is a significant threat to global TB control efforts and its prompt diagnosis is critical for initiating appropriate treatment. In this paper, we present an automated LPA test interpretation solution that uses computer vision techniques to extract and analyze strips from LPA sheets and uses machine learning algorithms to produce drug sensitivity and resistivity outcomes with extremely high precision and recall. We also develop OCR models to eliminate manual data entry and further reduce the overall time. Our solution comprises a rejection module that flags ambiguous and novel samples, which are then referred to experienced lab technicians; this increases trust in the solution. To evaluate our solution, we curate an extensive and diverse dataset of LPA strips annotated by multiple microbiologists across India. Our solution achieves more than 95% accuracy for all drugs on this dataset. The proposed solution has the potential to increase the efficiency and standardization of LPA test interpretation and to fast-track the dissemination of results to end-users via a designated Management Information System (MIS).

#4 Physics-Informed Graph Neural Networks for Water Distribution Systems

Authors: Inaam Ashraf ; Janine Strotherm ; Luca Hermes ; Barbara Hammer

Water distribution systems (WDS) are an integral part of critical infrastructure and pivotal to urban development. As 70% of the world's population will likely live in urban environments by 2050, efficient simulation and planning tools for WDS play a crucial role in reaching the UN's sustainable development goal (SDG) 6 - "Clean water and sanitation for all". In this realm, we propose a novel and efficient machine learning emulator, more precisely, a physics-informed deep learning (DL) model, for hydraulic state estimation in WDS. Using a recursive approach, our model needs only a few graph convolutional neural network (GCN) layers and employs an innovative algorithm based on message passing. Unlike conventional machine learning tasks, the model uses hydraulic principles to infer two additional hydraulic state features in the process of reconstructing the available ground truth feature in an unsupervised manner. To the best of our knowledge, this is the first DL approach to emulate the popular hydraulic simulator EPANET, utilizing no additional information. Like most DL models and unlike the hydraulic simulator, our model demonstrates vastly faster emulation times that do not increase drastically with the size of the WDS. Moreover, we achieve high accuracy on the ground truth and results very similar to those of the hydraulic simulator, as demonstrated through experiments on five real-world WDS datasets.
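
The recursive message-passing idea can be illustrated on a toy pipe graph (a sketch with assumed states and topology, not the paper's architecture): repeated neighbor averaging propagates known information, such as a measured head at one node, across the network, which is the core operation a GCN layer applies when estimating hydraulic state.

```python
# Toy message passing: each node's state is updated from its own state and
# the mean of its neighbors' states. After a few rounds, information from
# node 0 has reached every node in the graph.
edges = [(0, 1), (1, 2), (1, 3)]          # hypothetical pipe connections
state = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}  # e.g. known head at node 0 only

neighbors = {n: [] for n in state}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

for _ in range(3):  # a few rounds spread information across the graph
    state = {n: 0.5 * state[n] +
                0.5 * sum(state[m] for m in neighbors[n]) / len(neighbors[n])
             for n in state}

print(state)  # distant nodes 2 and 3 now carry nonzero information
```

In the paper's setting the update is learned and constrained by hydraulic principles rather than a fixed averaging rule.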

#5 Quantile-Regression-Ensemble: A Deep Learning Algorithm for Downscaling Extreme Precipitation

Authors: Thomas Bailie ; Yun Sing Koh ; Neelesh Rampal ; Peter B. Gibson

Global Climate Models (GCMs) simulate low resolution climate projections on a global scale. The native resolution of GCMs is generally too low for societal-level decision-making. To enhance the spatial resolution, downscaling is often applied to GCM output. Statistical downscaling techniques, in particular, are well-established as a cost-effective approach: they require significantly less computational time than physics-based dynamical downscaling. In recent years, deep learning has gained prominence in statistical downscaling, demonstrating significantly lower error rates compared to traditional statistical methods. However, a drawback of regression-based deep learning techniques is their tendency to overfit to the mean sample intensity; as a result, extreme values are often underestimated. Problematically, extreme events have the largest societal impact. We propose Quantile-Regression-Ensemble (QRE), an innovative deep learning algorithm inspired by boosting methods. Its primary objective is to avoid trade-offs between fitting to sample means and extreme values by training independent models on a partitioned dataset. QRE is robust to redundant models and not susceptible to explosive ensemble weights, ensuring a reliable training process. QRE achieves lower Mean Squared Error (MSE) compared to various baseline models. In particular, our algorithm has lower error for high-intensity precipitation events over New Zealand, highlighting its ability to represent extreme events accurately.
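
Quantile regression replaces the squared error with the pinball loss, whose asymmetry at high quantiles penalizes under-prediction of extremes far more than over-prediction. A minimal sketch with invented values (not the paper's setup):

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: the minimizer is the tau-th conditional
    quantile rather than the mean. A high tau (e.g. 0.95) makes
    under-predicting extremes expensive."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        diff = y - yhat
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

y = [0.0, 1.0, 10.0]        # one extreme precipitation value
mean_pred = [3.7] * 3       # a predictor that tracks the mean
high_q_pred = [9.0] * 3     # a predictor that tracks the upper tail
# At tau = 0.95, the tail-tracking predictor wins despite missing the mean:
print(pinball_loss(y, mean_pred, 0.95), pinball_loss(y, high_q_pred, 0.95))
```

An ensemble of such models at different quantiles, as in QRE, lets the overall predictor fit the bulk and the tail simultaneously.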

#6 Early Detection of Extreme Storm Tide Events Using Multimodal Data Processing

Authors: Marcel Barros ; Andressa Pinto ; Andres Monroy ; Felipe Moreno ; Jefferson Coelho ; Aldomar Pietro Silva ; Caio Fabricio Deberaldini Netto ; José Roberto Leite ; Marlon Mathias ; Eduardo Tannuri ; Artur Jordao ; Edson Gomi ; Fabio Cozman ; Marcelo Dottori ; Anna Helena Reali Costa

Sea-level rise is a well-known consequence of climate change. Several studies have estimated the social and economic impact of the increase in extreme flooding. An efficient way to mitigate its consequences is the development of a flood alert and prediction system, based on high-resolution numerical models and robust sensing networks. However, current models use various simplifying assumptions that compromise accuracy to ensure solvability within a reasonable timeframe, hindering more regular and cost-effective forecasts for various locations along the shoreline. To address these issues, this work proposes a hybrid model for multimodal data processing that combines physics-based numerical simulations, data obtained from a network of sensors, and satellite images to provide refined wave and sea-surface height forecasts, with real results obtained in a critical location within the Port of Santos (the largest port in Latin America). Our approach exhibits faster convergence than data-driven models while achieving more accurate predictions. Moreover, the model handles irregularly sampled time series and missing data without the need for complex preprocessing mechanisms or data imputation while keeping low computational costs through a combination of time encoding, recurrent and graph neural networks. Enabling raw sensor data to be easily combined with existing physics-based models opens up new possibilities for accurate forecast systems for extreme storm tide events, enhancing community safety and aiding policymakers in their decision-making processes.

#7 Decision-Making for Land Conservation: A Derivative-Free Optimization Framework with Nonlinear Inputs

Authors: Cassidy K. Buhler ; Hande Y. Benson

Protected areas (PAs) are designated spaces where human activities are restricted to preserve critical habitats. Decision-makers are challenged with balancing a trade-off of financial feasibility with ecological benefit when establishing PAs. Given the long-term ramifications of these decisions and the constantly shifting environment, it is crucial that PAs are carefully selected with long-term viability in mind. Using AI tools like simulation and optimization is common for designating PAs, but current decision models are primarily linear. In this paper, we propose a derivative-free optimization framework paired with a nonlinear component, population viability analysis (PVA). Formulated as a mixed integer nonlinear programming (MINLP) problem, our model allows for linear and nonlinear inputs. Connectivity, competition, crowding, and other similar concerns are handled by the PVA software, rather than expressed as constraints of the optimization model. In addition, we present numerical results that serve as a proof of concept, showing that our model yields PAs with an expected risk similar to that of preserving every parcel in a habitat, but at a significantly lower cost. The overall goal is to promote interdisciplinary work by providing a new mathematical programming tool for conservationists that allows for nonlinear inputs and can be paired with existing ecological software. The code and data are available at https://github.com/cassiebuhler/conservation-dfo.
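
The derivative-free idea can be sketched with a toy parcel-selection search in which a black-box risk function stands in for the PVA simulation. Everything below is hypothetical (costs, the risk model, the risk cap), and the flip-one-parcel local search is only a stand-in for the authors' MINLP framework:

```python
cost = [5, 8, 3, 6, 4, 7]  # hypothetical per-parcel acquisition costs

def risk(selection):
    """Black-box extinction-risk score standing in for a PVA run: risk falls
    as more (and more contiguous) parcels are protected. No derivatives are
    available to the optimizer -- it can only query this function."""
    protected = sum(selection)
    adjacency_bonus = sum(1 for i in range(len(selection) - 1)
                          if selection[i] and selection[i + 1])
    return max(0.0, 1.0 - 0.15 * protected - 0.05 * adjacency_bonus)

def objective(sel, risk_cap=0.4):
    # Total cost, plus a heavy penalty if simulated risk exceeds the cap.
    penalty = 1e3 * max(0.0, risk(sel) - risk_cap)
    return sum(c for c, s in zip(cost, sel) if s) + penalty

# Derivative-free local search: flip one parcel at a time, keep improvements.
best = [1] * 6                  # start by protecting every parcel
best_val = objective(best)
improved = True
while improved:
    improved = False
    for i in range(6):
        cand = best[:]
        cand[i] ^= 1
        if objective(cand) < best_val:
            best, best_val, improved = cand, objective(cand), True
print(best, best_val, risk(best))  # cheaper plan, risk still under the cap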

#8 CariesXrays: Enhancing Caries Detection in Hospital-Scale Panoramic Dental X-rays via Feature Pyramid Contrastive Learning

Authors: Bingzhi Chen ; Sisi Fu ; Yishu Liu ; Jiahui Pan ; Guangming Lu ; Zheng Zhang

Dental caries has been widely recognized as one of the most prevalent chronic diseases in the field of public health. Despite advancements in automated diagnosis across various medical domains, dental caries detection remains a substantial challenge due to its inherent variability and intricacies. To bridge this gap, we release a hospital-scale panoramic dental X-ray benchmark, namely “CariesXrays”, to facilitate advancements in high-precision computer-aided diagnosis for dental caries. It comprises 6,000 panoramic dental X-ray images, with a total of 13,783 instances of dental caries, all meticulously annotated by dental professionals. In this paper, we propose a novel Feature Pyramid Contrastive Learning (FPCL) framework that jointly incorporates feature pyramid learning and contrastive learning within a unified diagnostic paradigm for automated dental caries detection. Specifically, a robust dual-directional feature pyramid network (D2D-FPN) is designed to adaptively capture rich and informative contextual information from multi-level feature maps, thus enhancing the generalization ability of caries detection across different scales. Furthermore, our model is augmented with an effective proposals-prototype contrastive regularization learning (P2P-CRL) mechanism, which can flexibly bridge the semantic gaps among diverse dental caries with varying appearances, resulting in high-quality dental caries proposals. Extensive experiments on our newly-established CariesXrays benchmark demonstrate the potential of FPCL to make a significant social impact on caries diagnosis.

#9 Referee-Meta-Learning for Fast Adaptation of Locational Fairness

Authors: Weiye Chen ; Yiqun Xie ; Xiaowei Jia ; Erhu He ; Han Bao ; Bang An ; Xun Zhou

When dealing with data from distinct locations, machine learning algorithms tend to demonstrate an implicit preference for some locations over others, which constitutes biases that sabotage the spatial fairness of the algorithm. This unfairness can easily introduce biases in subsequent decision-making given the broad adoption of learning-based solutions in practice. However, locational biases in AI are largely understudied. To mitigate biases over locations, we propose a locational meta-referee (Meta-Ref) to oversee the few-shot meta-training and meta-testing of a deep neural network. Meta-Ref dynamically adjusts the learning rates for training samples of given locations to advocate a fair performance across locations, through an explicit consideration of locational biases and the characteristics of input data. We present a three-phase training framework to learn both a meta-learning-based predictor and an integrated Meta-Ref that governs the fairness of the model. Once trained with a distribution of spatial tasks, Meta-Ref is applied to samples from new spatial tasks (i.e., regions outside the training area) to promote fairness during the fine-tuning step. We carried out experiments with two case studies on crop monitoring and transportation safety, which show that Meta-Ref can improve locational fairness while keeping the overall prediction quality at a similar level.
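
The referee's core mechanism, raising the effective learning rate of locations the model currently serves poorly, can be sketched on a toy shared regression problem. The data, the loss-proportional weighting rule, and the scalar model are all hypothetical simplifications, not Meta-Ref itself:

```python
# Two "locations" with conflicting slopes share one scalar parameter w.
# The referee scales each location's learning rate by its share of the
# total loss, pulling per-location errors closer together.
data = {"loc_A": [(1.0, 2.0), (2.0, 4.1)],   # y ~ 2x
        "loc_B": [(1.0, 3.0), (2.0, 5.9)]}   # y ~ 3x

def loc_loss(w, samples):
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

w, base_lr = 0.0, 0.05
for _ in range(200):
    losses = {k: loc_loss(w, v) for k, v in data.items()}
    total = sum(losses.values()) or 1.0
    for k, samples in data.items():
        # Referee: per-location lr proportional to that location's loss share.
        lr_k = base_lr * (2 * losses[k] / total)
        grad = sum(2 * x * (w * x - y) for x, y in samples) / len(samples)
        w -= lr_k * grad

final = {k: loc_loss(w, v) for k, v in data.items()}
print(w, final)  # per-location losses end up close to each other
```

The real Meta-Ref learns this adjustment policy via meta-learning instead of using a fixed proportional rule.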

#10 From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

Authors: Yuhan Chen ; Nuwa Xi ; Yanrui Du ; Haochun Wang ; Jianyu Chen ; Sendong Zhao ; Bing Qin

Molecule discovery serves as a cornerstone in numerous scientific domains, fueling the development of new materials and innovative drug designs. Recent developments in in-silico molecule discovery have highlighted the promising results of cross-modal techniques, which bridge molecular structures with their descriptive annotations. However, these cross-modal methods frequently encounter the issue of data scarcity, hampering their performance and application. In this paper, we address the low-resource challenge by utilizing artificially-real data generated by Large Language Models (LLMs). We first introduce a retrieval-based prompting strategy to construct high-quality pseudo data, then explore the optimal method to effectively leverage this pseudo data. Experiments show that using pseudo data for domain adaptation outperforms all existing methods, while also requiring a smaller model scale, reduced data size, and lower training cost, highlighting its efficiency. Furthermore, our method shows a sustained improvement as the volume of pseudo data increases, revealing the great potential of pseudo data in advancing low-resource cross-modal molecule discovery.

#11 Auto311: A Confidence-Guided Automated System for Non-emergency Calls

Authors: Zirong Chen ; Xutong Sun ; Yuanhe Li ; Meiyi Ma

Emergency and non-emergency response systems are essential services provided by local governments and critical to protecting lives, the environment, and property. The effective handling of (non-)emergency calls is critical for public safety and well-being. By shifting the burden of non-emergency calls away from 911, residents in critical need of assistance can receive a faster and more effective response. Collaborating with the Department of Emergency Communications (DEC) in Nashville, we analyzed 11,796 non-emergency call recordings and developed Auto311, the first automated system to handle 311 non-emergency calls, which (1) effectively and dynamically predicts ongoing non-emergency incident types to generate tailored case reports during the call; (2) itemizes essential information from dialogue contexts to complete the generated reports; and (3) strategically structures system-caller dialogues with optimized confidence. We used real-world data to evaluate the system's effectiveness and deployability. The experimental results indicate that the system effectively predicts incident type with an average F1 score of 92.54%. Moreover, the system successfully itemizes critical information from relevant contexts to complete reports, evincing a 0.93 average consistency score compared to the ground truth. Additionally, emulations demonstrate that the system effectively decreases the number of conversation turns as utterances grow longer, and categorizes the ongoing call with 94.49% mean accuracy.

#12 Blind-Touch: Homomorphic Encryption-Based Distributed Neural Network Inference for Privacy-Preserving Fingerprint Authentication

Authors: Hyunmin Choi ; Simon S. Woo ; Hyoungshick Kim

Fingerprint authentication is a popular security mechanism for smartphones and laptops. However, its adoption in web and cloud environments has been limited due to privacy concerns over storing and processing biometric data on servers. This paper introduces Blind-Touch, a novel machine learning-based fingerprint authentication system leveraging homomorphic encryption to address these privacy concerns. Homomorphic encryption allows computations on encrypted data without decrypting it. Thus, Blind-Touch can keep fingerprint data encrypted on the server while performing machine learning operations. Blind-Touch combines three strategies to efficiently utilize homomorphic encryption in machine learning: (1) It optimizes the feature vector for a distributed architecture, processing the first fully connected layer (FC-16) in plaintext on the client side and the subsequent layer (FC-1) post-encryption on the server, thereby minimizing encrypted computations; (2) It employs a homomorphic encryption-compatible data compression technique capable of handling 8,192 authentication results concurrently; and (3) It utilizes a clustered server architecture to simultaneously process authentication results, thereby enhancing scalability with increasing user numbers. Blind-Touch achieves high accuracy on two benchmark fingerprint datasets, with a 93.6% F1-score for the PolyU dataset and a 98.2% F1-score for the SOKOTO dataset. Moreover, Blind-Touch can match a fingerprint among 5,000 in about 0.65 seconds. With its privacy-focused design, high accuracy, and efficiency, Blind-Touch is a promising alternative to conventional fingerprint authentication for web and cloud applications.
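
The server-side trick, evaluating a linear layer on ciphertexts, can be illustrated with textbook Paillier encryption, whose additive homomorphism lets a dot product with plaintext weights be computed without ever decrypting the activations. This is a toy with tiny primes for readability, not Blind-Touch's actual scheme (which relies on a production homomorphic-encryption library), and the activations and weights below are made up:

```python
import math
import random

random.seed(0)

# Textbook Paillier keygen with g = n + 1 (insecure toy parameters).
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)            # modular inverse (Python >= 3.8)

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Server-side linear layer on encrypted activations: the dot product <w, x>
# is computed as prod(c_i^{w_i}) mod n^2, all on ciphertexts.
x = [3, 1, 4, 1, 5]             # hypothetical client-side activations
w = [2, 7, 1, 8, 2]             # server's plaintext weights (non-negative ints)
cts = [encrypt(xi) for xi in x]
c_dot = 1
for ci, wi in zip(cts, w):
    c_dot = (c_dot * pow(ci, wi, n2)) % n2

print(decrypt(c_dot), sum(wi * xi for wi, xi in zip(w, x)))  # both 35
```

Real deployments use large keys and schemes supporting packed vectors, but the commutation of encryption with linear layers shown here is the property that makes the FC-1 server layer possible.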

#13 Identifying Guarantors of War Veterans Using Robust-SEAL: A Case of the Korean War

Authors: Jong in Choi ; Won Kyung Lee ; Jae Hwan Lee ; So Young Sohn

Most countries provide veterans with various benefits to reward their sacrifice. Unfortunately, many veterans have failed to prove their status due to loss of military records. Thus, some governments allow the verification of those veterans through "buddy statements" obtained from people who can vouch for the buddy's participation in the war. However, it is still challenging for veterans to find guarantors directly. With this background, we suggest utilizing historical war records of combined operations to increase the pool of potential guarantors for buddy statements. However, a combined-operation network among troops can have missing edges and perturbations in troop attributes due to inaccurate information. In this study, we learn from recorded interactions, which might be incomplete and noisy, and predict missing linkages among troops that might have interacted in the war, by proposing Robust-SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction). It combines two Graph Neural Network (GNN) architectures: robust Graph Convolutional Network, which considers the uncertainty of node attributes with a probabilistic approach, and SEAL, which improves the expressive power of the GNN with a labeling trick. Our proposed approach was applied to Korean War data with perturbations. In our experiments, we hid some actual interactions and found that Robust-SEAL restores missing interactions better than other GNN-based baselines.
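
To make the link-prediction task concrete, here is the classical common-neighbors heuristic on a toy troop-interaction graph. The troop names and edges are invented, and this simple baseline is precisely the kind of method Robust-SEAL is designed to outperform:

```python
from itertools import combinations

# Observed (possibly incomplete) combined-operation edges between troops.
observed = {("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4"), ("t4", "t5")}
nodes = {"t1", "t2", "t3", "t4", "t5"}

adj = {n: set() for n in nodes}
for u, v in observed:
    adj[u].add(v)
    adj[v].add(u)

# Score each unobserved pair by the number of shared partners: troops with
# many common collaborators likely interacted even if no record survives.
scores = {}
for u, v in combinations(sorted(nodes), 2):
    if v not in adj[u]:
        scores[(u, v)] = len(adj[u] & adj[v])

best = max(scores, key=scores.get)
print(best, scores[best])  # top candidate pair for a missing interaction
```

Unlike this heuristic, Robust-SEAL also exploits subgraph structure and noisy node attributes, which matters when the records themselves are unreliable.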

#14 Fair Sampling in Diffusion Models through Switching Mechanism

Authors: Yujin Choi ; Jinseong Park ; Hoki Kim ; Jaewook Lee ; Saerom Park

Diffusion models have shown their effectiveness in generation tasks by well-approximating the underlying probability distribution. However, diffusion models are known to suffer from an amplified inherent bias from the training data in terms of fairness. While the sampling process of diffusion models can be controlled by conditional guidance, previous works have attempted to find empirical guidance to achieve quantitative fairness. To address this limitation, we propose a fairness-aware sampling method called attribute switching mechanism for diffusion models. Without additional training, the proposed sampling can obfuscate sensitive attributes in generated data without relying on classifiers. We mathematically prove and experimentally demonstrate the effectiveness of the proposed method on two key aspects: (i) the generation of fair data and (ii) the preservation of the utility of the generated data.
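
The switching idea can be sketched with a dummy conditional denoiser: run the reverse process under one attribute condition up to a transition step, then switch conditions for the remaining steps. The scalar "model", update rule, and conditions below are all hypothetical stand-ins for a real diffusion model:

```python
def denoise_step(x, t, cond):
    # Hypothetical conditional denoiser: pulls x toward a per-condition target.
    target = {"c1": 1.0, "c2": -1.0}[cond]
    return x + 0.1 * (target - x)

def sample(schedule, x0=0.0):
    x, trace = x0, []
    for t, cond in enumerate(schedule):
        x = denoise_step(x, t, cond)
        trace.append(cond)
    return x, trace

T, tau = 20, 12                          # switch attribute at step tau
switched = ["c1"] * tau + ["c2"] * (T - tau)
x_sw, trace = sample(switched)
x_c1, _ = sample(["c1"] * T)             # conditioned on c1 throughout
x_c2, _ = sample(["c2"] * T)             # conditioned on c2 throughout
print(x_sw, x_c1, x_c2)  # switched sample lands between the two extremes
```

The switched trajectory inherits coarse structure from the early steps while the late steps, which determine fine attribute detail, follow the new condition; the paper's contribution is proving when and where such a switch preserves the data distribution.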

#15 Arbitrariness and Social Prediction: The Confounding Role of Variance in Fair Classification

Authors: A. Feder Cooper ; Katherine Lee ; Madiha Zahrah Choksi ; Solon Barocas ; Christopher De Sa ; James Grimmelmann ; Jon Kleinberg ; Siddhartha Sen ; Baobao Zhang

Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification.
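
A pairwise-agreement reading of self-consistency, and the abstaining ensemble built on it, can be sketched in a few lines. This is a hedged illustration of the idea, not the paper's exact estimator; the votes and threshold are made up:

```python
from collections import Counter

def self_consistency(preds):
    """Fraction of ordered model pairs that agree on this example: 1.0 means
    every trained model predicts the same label; values near 0.5 mean the
    decision is effectively arbitrary."""
    m = len(preds)
    agree = sum(c * (c - 1) for c in Counter(preds).values())
    return agree / (m * (m - 1))

def ensemble_predict(preds, threshold=0.5):
    # Abstain when the prediction would be arbitrary across trained models.
    if self_consistency(preds) < threshold:
        return "abstain"
    return Counter(preds).most_common(1)[0][0]

stable = [1, 1, 1, 1, 0]     # hypothetical votes from 5 independently trained models
arbitrary = [1, 0, 1, 0, 1]
print(self_consistency(stable), ensemble_predict(stable))        # 0.6, predict 1
print(self_consistency(arbitrary), ensemble_predict(arbitrary))  # 0.4, abstain
```

Measured this way, an example whose predictions flip across training runs is flagged rather than silently assigned an arbitrary label.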

#16 Finding ε and δ of Traditional Disclosure Control Systems

Authors: Saswat Das ; Keyu Zhu ; Christine Task ; Pascal Van Hentenryck ; Ferdinando Fioretto

This paper analyzes the privacy of traditional Statistical Disclosure Control (SDC) systems under a differential privacy interpretation. SDCs, such as cell suppression and swapping, promise to safeguard the confidentiality of data and are routinely adopted in data analyses with profound societal and economic impacts. Through a formal analysis and empirical evaluation of demographic data from real households in the U.S., the paper shows that widely adopted SDC systems not only induce vastly larger privacy losses than classical differential privacy mechanisms, but may also come at the cost of worse accuracy and fairness.

#17 MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

Authors: Scott L. Fleming ; Alejandro Lozano ; William J. Haberkorn ; Jenelle A. Jindal ; Eduardo Reis ; Rahul Thapa ; Louis Blankemeier ; Julian Z. Genkins ; Ethan Steinberg ; Ashwin Nayak ; Birju Patel ; Chia-Chun Chiang ; Alison Callahan ; Zepeng Huo ; Sergios Gatidis ; Scott Adams ; Oluseyi Fayanju ; Shreya J. Shah ; Thomas Savage ; Ethan Goh ; Akshay S. Chaudhari ; Nima Aghaeepour ; Christopher Sharp ; Michael A. Pfeffer ; Percy Liang ; Jonathan H. Chen ; Keith E. Morse ; Emma P. Brunskill ; Jason A. Fries ; Nigam H. Shah

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy when moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.

#18 CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare

Authors: Akash Ghosh ; Arkadeep Acharya ; Raghav Jain ; Sriparna Saha ; Aman Chadha ; Setu Sinha

In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution of our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework that harnesses the power of Contrastive Language Image Pretraining (CLIP), a multimodal foundation model, and various general-purpose Large Language Models (LLMs), comprising four main modules: a medical disorder identification module, a relevant context generation module, a context filtration module for distilling relevant medical concepts and knowledge, and finally a general-purpose LLM that generates visually aware medical question summaries. Leveraging our MMQS dataset, we showcase how visual cues from images enhance the generation of medically nuanced summaries. This multimodal approach not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries, laying the groundwork for future research in personalized and responsive medical care. Disclaimer: The article features graphic medical imagery, a result of the subject's inherent requirements.

#19 Benchmarking Cyber Harassment Dialogue Comprehension through Emotion-Informed Manifestations-Determinants Demarcation

Authors: Soumitra Ghosh ; Gopendra Vikram Singh ; Jashn Arora ; Asif Ekbal

In the digital age, cybercrimes, particularly cyber harassment, have become pressing issues, targeting vulnerable individuals like children, teenagers, and women. Understanding the experiences and needs of the victims is crucial for effective support and intervention. Online conversations between victims and virtual harassment counselors (chatbots) offer valuable insights into cyber harassment manifestations (CHMs) and determinants (CHDs). However, the distinction between CHMs and CHDs remains unclear. This research is the first to introduce concrete definitions for CHMs and CHDs, investigating their distinction through automated methods to enable efficient cyber-harassment dialogue comprehension. We present a novel dataset, Cyber-MaD, that contains Cyber harassment dialogues manually annotated with Manifestations and Determinants. Additionally, we design an Emotion-informed Contextual Dual attention Convolution Transformer (E-ConDuCT) framework to extract CHMs and CHDs from cyber harassment dialogues. The framework primarily: a) utilizes inherent emotion features through adjective-noun pairs modeled by an autoencoder, b) employs a unique Contextual Dual attention Convolution Transformer to learn contextual insights; and c) incorporates a demarcation module leveraging task-specific emotional knowledge and a discriminator loss function to differentiate manifestations and determinants. E-ConDuCT outperforms the state-of-the-art systems on the Cyber-MaD corpus, showcasing its potential in the extraction of CHMs and CHDs. Furthermore, its robustness is demonstrated on the emotion cause extraction task using the CARES_CEASE-v2.0 dataset of suicide notes, confirming its efficacy across diverse cause extraction objectives. Access the code and data at 1. https://www.iitp.ac.in/~ai-nlp-ml/resources.html#E-ConDuCT-on-Cyber-MaD, 2. https://github.com/Soumitra816/Manifestations-Determinants.

#20 Grey-Box Bayesian Optimization for Sensor Placement in Assisted Living Environments [PDF] [Copy] [Kimi]

Authors: Shadan Golestan ; Omid Ardakanian ; Pierre Boulanger

Optimizing the configuration and placement of sensors is crucial for reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a novel, sample-efficient approach that finds a high-quality sensor placement in an arbitrary indoor space based on grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution lies in capturing domain-specific knowledge about the spatial distribution of activities and incorporating it into the iterative selection of query points in Bayesian optimization. Considering two simulated indoor environments and a real-world dataset containing human activities and sensor triggers, we show that our proposed method outperforms state-of-the-art black-box optimization techniques in identifying high-quality sensor placements, leading to an accurate activity recognition model in terms of F1-score, while requiring significantly fewer (51.3% fewer on average) expensive function queries.
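The grey-box idea in #20, steering Bayesian-optimization queries with prior knowledge of where activities occur, can be sketched as a tiny 1-D GP-UCB loop whose exploration bonus is weighted by an activity-density prior. Everything here (`grey_box_bo`, the RBF length-scale, the multiplicative prior weighting) is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel on 1-D locations."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_posterior(x, y, xs, jitter=1e-5):
    """GP posterior mean and variance at candidate locations xs."""
    Kinv = np.linalg.inv(rbf(x, x) + jitter * np.eye(len(x)))
    Ks = rbf(x, xs)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum("ij,ik,kj->j", Ks, Kinv, Ks)
    return mu, np.clip(var, 1e-12, None)

def grey_box_bo(f, cand, prior, n_iter=12, kappa=2.0):
    """UCB Bayesian optimisation whose exploration term is weighted by a
    domain prior over candidate placements -- the 'grey-box' ingredient."""
    visited = [0, len(cand) - 1]                 # seed with the two extremes
    y = [f(cand[0]), f(cand[-1])]
    for _ in range(n_iter):
        mu, var = gp_posterior(cand[visited], np.array(y), cand)
        score = mu + kappa * prior * np.sqrt(var)
        score[visited] = -np.inf                 # never re-query a placement
        i = int(np.argmax(score))
        visited.append(i)
        y.append(f(cand[i]))
    best = int(np.argmax(y))
    return cand[visited[best]], y[best]
```

Weighting only the exploration bonus by the prior keeps the surrogate's exploitation intact while biasing expensive simulator queries toward regions where activities (and hence informative sensor triggers) are expected.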

#21 Federated Learning via Input-Output Collaborative Distillation [PDF] [Copy] [Kimi]

Authors: Xuan Gong ; Shanglin Li ; Yuxiang Bao ; Barry Yao ; Yawen Huang ; Ziyan Wu ; Baochang Zhang ; Yefeng Zheng ; David Doermann

Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter relies on task-relevant real data as a prerequisite. Instead, we propose a data-free FL framework based on local-to-central collaborative distillation that directly exploits the input and output spaces. Our design eliminates any requirement for recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users. In particular, to cope with the inherent data heterogeneity across local nodes, our technique learns to distill inputs on which each local model produces consensual yet unique results that represent its expertise. Our proposed FL framework achieves notable privacy-utility trade-offs in extensive experiments on image classification and segmentation tasks under various real-world heterogeneous federated learning settings on both natural and medical images. Code is available at https://github.com/lsl001006/FedIOD.
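A stripped-down flavour of the data-free distillation in #21: frozen local "teacher" models are queried on random probe inputs in place of private data, and a central student is distilled toward their consensus outputs. This NumPy sketch omits the paper's learned input synthesis and heterogeneity handling; the linear-softmax models and all names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce(p, t):
    """Mean cross-entropy between predicted and target distributions."""
    return -np.mean(np.sum(t * np.log(p + 1e-12), axis=1))

rng = np.random.default_rng(0)
d, k = 6, 3
teachers = [rng.normal(size=(d, k)) for _ in range(4)]   # frozen local models

# Data-free: random probe inputs stand in for the unavailable private data.
X = rng.normal(size=(500, d))
targets = np.mean([softmax(X @ W) for W in teachers], axis=0)  # output consensus

# Distil a central linear-softmax student toward the consensus by gradient descent.
W = np.zeros((d, k))
for _ in range(300):
    W -= 0.5 * X.T @ (softmax(X @ W) - targets) / len(X)
```

After distillation the student's cross-entropy to the consensus targets falls well below that of an untrained model, with no teacher parameters or real data exchanged.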

#22 Scaling Up Pareto Optimization for Tree Structures with Affine Transformations: Evaluating Hybrid Floating Solar-Hydropower Systems in the Amazon [PDF] [Copy] [Kimi]

Authors: Marc Grimson ; Rafael Almeida ; Qinru Shi ; Yiwei Bai ; Héctor Angarita ; Felipe Siqueira Pacheco ; Rafael Schmitt ; Alexander Flecker ; Carla P. Gomes

Sustainability challenges inherently involve the consideration of multiple competing objectives. The Pareto frontier – the set of all optimal solutions that cannot be improved with respect to one objective without negatively affecting another – is a crucial decision-making tool for navigating sustainability challenges as it highlights the inherent trade-offs among conflicting objectives. Our research is motivated by the strategic planning of hydropower in the Amazon basin, one of the earth’s largest and most biodiverse river systems, where the need to increase energy production coincides with the pressing requirement of minimizing detrimental environmental impacts. We investigate an innovative strategy that pairs hydropower with Floating Photovoltaic Solar Panels (FPV). We provide a new extended multi-tree network formulation, which enables the consideration of multiple dam configurations. To address the computational challenge of scaling up the Pareto optimization framework to tackle multiple objectives across the entire Amazon basin, we further enhance the state-of-the-art algorithm for Pareto frontiers in tree-structured networks with two improvements. We introduce affine transformations induced by the sub-frontiers to compute Pareto dominance and provide strategies for merging sub-trees, significantly increasing the pruning of dominated solutions. Our experiments demonstrate considerable speedups, in some cases by more than an order of magnitude, while maintaining optimality guarantees, thus allowing us to more effectively approximate the Pareto frontiers. Moreover, our findings suggest significant shifts towards higher energy values in the Pareto frontier when pairing hybrid hydropower with FPV solutions, potentially amplifying energy production while mitigating adverse impacts.
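The core frontier operations in #22, pruning dominated solutions and merging the frontiers of two child sub-trees, can be sketched for two maximized objectives (the affine-transformation speedups are beyond this toy version):

```python
def pareto_front(points):
    """Non-dominated subset for two maximised objectives (e.g. energy vs.
    an environmental score). Sort by descending first objective, then sweep."""
    pts = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    front, best_y = [], float("-inf")
    for x, y in pts:
        if y > best_y:          # not dominated by any point with larger x
            front.append((x, y))
            best_y = y
    return front

def merge_subtrees(f1, f2):
    """Combine the sub-frontiers of two child sub-trees: form all pairwise
    objective sums, then prune the dominated combinations."""
    return pareto_front([(a + c, b + d) for a, b in f1 for c, d in f2])
```

The pairwise-sum-then-prune step is where merge order and dominance checks dominate the cost, which is exactly what the paper's affine-transformation and merging strategies accelerate.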

#23 Fair Multivariate Adaptive Regression Splines for Ensuring Equity and Transparency [PDF] [Copy] [Kimi]

Authors: Parian Haghighat ; Denisa Gándara ; Lulu Kang ; Hadis Anahideh

Predictive analytics has been widely used in various domains, including education, to inform decision-making and improve outcomes. However, many predictive models are proprietary and inaccessible for evaluation or modification by researchers and practitioners, limiting their accountability and ethical design. Moreover, predictive models are often opaque and incomprehensible to the officials who use them, reducing their trust and utility. Furthermore, predictive models may introduce or exacerbate bias and inequity, as they have done in many sectors of society. Therefore, there is a need for transparent, interpretable, and fair predictive models that can be easily adopted and adapted by different stakeholders. In this paper, we propose a fair predictive model based on multivariate adaptive regression splines (MARS) that incorporates fairness measures in the learning process. MARS is a non-parametric regression model that performs feature selection, handles non-linear relationships, generates interpretable decision rules, and derives optimal splitting criteria on the variables. Specifically, we integrate fairness into the knot optimization algorithm and provide theoretical and empirical evidence of how it results in a fair knot placement. We apply our fairMARS model to real-world data and demonstrate its effectiveness in terms of accuracy and equity. Our paper contributes to the advancement of responsible and ethical predictive analytics for social good.
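Integrating fairness into MARS knot selection, as #23 describes, can be illustrated by scoring each candidate knot with squared error plus a group-disparity penalty. The criterion below (SSE plus a scaled group-MSE gap) and the function names are hypothetical simplifications of the paper's fair knot-optimization algorithm.

```python
import numpy as np

def hinge_basis(x, t):
    """MARS basis for a single knot t: intercept plus the two hinge functions."""
    return np.column_stack([np.ones_like(x),
                            np.maximum(x - t, 0.0),
                            np.maximum(t - x, 0.0)])

def fair_knot(x, y, group, candidates, lam=1.0):
    """Pick the knot minimising SSE + lam * (scaled |group MSE gap|)."""
    best_t, best_obj = None, np.inf
    for t in candidates:
        B = hinge_basis(x, t)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        r = y - B @ coef
        gap = abs(np.mean(r[group == 0] ** 2) - np.mean(r[group == 1] ** 2))
        obj = np.sum(r ** 2) + lam * len(x) * gap   # scale gap to SSE units
        if obj < best_obj:
            best_t, best_obj = t, obj
    return best_t
```

With `lam=0` this reduces to ordinary SSE-minimizing knot placement; increasing `lam` trades fit quality for a smaller residual-error gap between the two groups.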

#24 Fair Graph Learning Using Constraint-Aware Priority Adjustment and Graph Masking in River Networks [PDF] [Copy] [Kimi]

Authors: Erhu He ; Yiqun Xie ; Alexander Sun ; Jacob Zwart ; Jie Yang ; Zhenong Jin ; Yang Wang ; Hassan Karimi ; Xiaowei Jia

Accurate prediction of water quality and quantity is crucial for sustainable development and human well-being. However, existing data-driven methods often suffer from spatial biases in model performance due to heterogeneous data, limited observations, and noisy sensor readings. To overcome these challenges, we propose Fair-Graph, a novel graph-based recurrent neural network that leverages interrelated knowledge from multiple rivers to predict water flow and temperature within large-scale stream networks. We introduce node-specific graph masks for information aggregation and adaptation to enhance prediction over heterogeneous river segments. To reduce performance disparities across river segments, we further introduce a centralized coordination strategy that adjusts training priorities for segments. We evaluate the model on water temperature prediction in the Delaware River Basin and on streamflow prediction using simulated data from the U.S. National Water Model in the Houston River network. The results show improvements in predictive performance and highlight the proposed model's ability to maintain spatial fairness over different river segments.
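The node-specific graph masks of #24 can be pictured as a per-node gate on aggregated neighbour information. A minimal NumPy sketch, assuming mean aggregation, a residual update, and fixed masks `M` (Fair-Graph learns the masks jointly with a recurrent model):

```python
import numpy as np

def masked_aggregate(H, A, M):
    """One message-passing step in which each river segment (node) applies its
    own feature mask M[i] to the information aggregated from its neighbours.
    H: node features (n, d); A: adjacency (n, n); M: per-node masks (n, d)."""
    deg = np.clip(A.sum(axis=1, keepdims=True), 1, None)
    neigh = (A @ H) / deg          # mean over neighbouring segments
    return M * neigh + H           # per-node gated update with residual
```

A zero mask entry lets a heterogeneous segment ignore a neighbour-derived feature entirely, while the residual keeps its own observations intact.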

#25 Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media [PDF] [Copy] [Kimi]

Authors: Liam Hebert ; Gaurav Sahu ; Yuxuan Guo ; Nanda Kishore Sreenivas ; Lukasz Golab ; Robin Cohen

We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech in online discussions on social networks such as Reddit. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is achieved by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment, and by grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.
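The fusion-plus-discussion-context idea in mDT (#25) can be caricatured as a single attention step: each comment's text and image embeddings are concatenated, and attention is masked to the discussion graph. This single-head NumPy sketch is illustrative only; mDT interleaves learned fusion layers inside a graph transformer rather than using raw dot-product attention.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def discussion_attention(text, image, A):
    """Fuse each comment's text and image embeddings, then attend only over
    comments connected in the discussion graph (adjacency A, self-loops added)."""
    H = np.concatenate([text, image], axis=1)          # early multimodal fusion
    scores = H @ H.T / np.sqrt(H.shape[1])
    mask = A + np.eye(len(A))
    scores = np.where(mask > 0, scores, -1e9)          # restrict to context
    return softmax(scores) @ H
```

Masking the attention to the reply graph is what lets a comment's label depend on the surrounding discussion rather than on the comment in isolation.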